SUMMARY

In this paper we study estimation of the parameters of generalized
linear models in canonical form when the explanatory vector is measured
with independent normal error. For the functional case, i.e., when the
explanatory vectors are fixed constants, unbiased score functions are
obtained by conditioning on certain sufficient statistics. This work
generalizes results obtained by the authors (Stefanski & Carroll, 1986) for
logistic regression. In the case that the explanatory vectors are
independent and identically distributed with unknown distribution, efficient
score functions are obtained using the theory developed in Begun et al.
(1983). Related results can be found in Bickel & Ritov (1986).


Some key words: Conditional score function; Efficient score function;
Functional model; Generalized linear model; Measurement error; Structural
model.
1.  INTRODUCTION


Given a covariate p-vector $u$ assume that $Y$ has the density

$$h_Y(y;\theta,u) = \exp\left\{\frac{y(\alpha+\beta^{T}u) - b(\alpha+\beta^{T}u)}{a(\phi)} + c(y,\phi)\right\} \tag{1.1}$$

with respect to a $\sigma$-finite measure $m(\cdot)$; in (1.1) $\theta^{T} = (\alpha,\beta^{T},\phi)$ and
$a(\cdot)$, $b(\cdot)$ and $c(\cdot,\cdot)$ are known functions. The density (1.1) is that of a
generalized linear model in canonical form (McCullagh & Nelder, 1983, Ch.
2). Suppose now that $u$ cannot be observed but that $M$ independent
measurements $X = (X_1,\ldots,X_M)$ of $u$ are available. When measurement error is
normally distributed the matrix $X$ has density

$$h_X(x;\Omega,u) = \prod_{j=1}^{M} \frac{1}{(2\pi)^{p/2}|\Omega|^{1/2}} \exp\left\{-\tfrac12 (x_j-u)^{T}\Omega^{-1}(x_j-u)\right\}. \tag{1.2}$$

Together (1.1) and (1.2) define a generalized linear measurement-error
model with normal measurement error. If for a sample $(Y_i,X_i)$ $(i=1,\ldots,n)$
the covariables $(u_i)$ are unknown constants, a functional model is obtained;
if $(u_i)$ are independent and identically distributed random vectors from
some unknown distribution, a structural model is obtained (Kendall &
Stuart, 1979, Chapter 29). In this paper the problem of deriving unbiased
scores for $\theta$ in both functional and structural models is studied.

There is a vast literature on this problem in the special case that
(1.1) is a normal density. This dates back to Adcock (1878) and has been
reviewed by Anderson (1976); see also Moran (1971). Recently there has
been considerable interest in nonlinear measurement-error models; see
Prentice (1982), Wolter & Fuller (1982a, 1982b), Carroll et al. (1984),
Stefanski (1985) and Stefanski & Carroll (1986).


The density (1.1) includes normal, Poisson, logistic and gamma
regression models. The key feature these models have in common is the
existence of a natural sufficient statistic for $u$ when all other parameters
are fixed. The same is true of the normal density in (1.2). In fact (1.2)
could be replaced with any density possessing a natural sufficient
statistic for $u$ when other parameters are fixed and much of the following
theory holds with little or no modification. However, in the framework of
measurement-error models no other assumption on the error distribution is
more palatable than that of normality and thus the added generality is
sacrificed for a reduction in notational complexity.

In Section 2 functional models are studied and unbiased score
functions for estimating $\theta$ in the presence of the unknown $u_i$'s are
presented. This work generalizes and extends results of Stefanski &
Carroll (1986) for logistic regression. Structural models are studied in
Section 3 and efficient score functions for estimating $\theta$ in the presence
of the unknown distribution for $u$ are identified. These results are
obtained using the theory of efficient estimation developed by Begun et
al. (1983). Other work in this area includes that of Bickel & Ritov
(1986).

In the case that the covariates $u_1,\ldots,u_n$ are observed without
error the maximum likelihood estimator of $\theta$ maximizes

$$\sum_{i=1}^{n} \log h_Y(Y_i;\theta,u_i)$$

with respect to $\theta$. Let $\bar X_i$ be the mean of the $M$ measurements of $u_i$; that
value of $\theta$ which maximizes

$$\sum_{i=1}^{n} \log h_Y(Y_i;\theta,\bar X_i)$$

will be referred to as the naive estimator. This estimator is usually
inconsistent (Stefanski, 1985) although when $\Omega/M$ is small its bias will
be small.
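To make the attenuation of the naive estimator concrete, the following is a
minimal simulation sketch that is not part of the original paper. It assumes
the normal linear special case of (1.1) with p = 1 and M = 1; the function
name `naive_slope` and every parameter value are hypothetical choices made
only for illustration. In that special case the naive slope is known to
approach $\sigma_u^{2}\beta/(\sigma_u^{2}+\tau^{2})$, where $\sigma_u^{2}$ is the variance of the true
covariate and $\tau^{2}$ is the measurement-error variance.

```python
import random

def naive_slope(n, beta, sigma_u, tau, sigma, seed=0):
    """Least-squares slope of Y on X when X = u + measurement error (M = 1)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sigma_u)                  # true covariate (hypothetical design)
        x = u + rng.gauss(0.0, tau)                  # error-prone measurement of u
        y = 1.0 + beta * u + rng.gauss(0.0, sigma)   # normal response, intercept fixed at 1
        xs.append(x)
        ys.append(y)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

# Small error variance: the naive slope stays close to beta = 2.
slope_small_error = naive_slope(20000, beta=2.0, sigma_u=1.0, tau=0.1, sigma=0.5)
# Error variance equal to the covariate variance: the slope is roughly halved.
slope_large_error = naive_slope(20000, beta=2.0, sigma_u=1.0, tau=1.0, sigma=0.5)
```

With these settings the attenuation factor is about 0.99 in the first call
and 0.5 in the second, which is the sense in which the bias is small when
the measurement-error variance is small.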
2.  FUNCTIONAL MODELS

2.1  The functional likelihood

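Consider the functional version of model (1.1) & (1.2). In this
section the case $M = 1$ is studied under the additional assumption that

$$\Omega/a(\phi) = \Omega \quad \text{(known)}, \tag{2.1}$$

in which the left-hand $\Omega$ is the measurement-error covariance matrix of
(1.2) and the right-hand $\Omega$ is a known matrix; in the remainder of the
paper the symbol $\Omega$ refers to this known matrix, so that the error
covariance is $a(\phi)\Omega$. Throughout this section the random variables
$(Y_i,X_i)$, $(i=1,\ldots,n)$ are independent but not identically distributed
since their distributions depend on the true regressors $u_i$, which vary
with $i$. However, for notational convenience the subscript $i$ will be
dropped when referring to $(Y_i,X_i)$ in those situations where it causes no
confusion. Under (1.1), (1.2) and (2.1) the joint density of $(Y,X)$ takes
the form

$$h_{Y,X}(y,x;\theta,u) = h_Y(y;\theta,u)\,h_X(x;\theta,u). \tag{2.2}$$

For a set of $n$ observations the log-likelihood is

$$L(\theta,u_1,\ldots,u_n) = \sum_{i=1}^{n} \log\{h_{Y,X}(Y_i,X_i;\theta,u_i)\}. \tag{2.3}$$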

In the case that $Y$ is normally distributed it is known that under (2.1)
maximizing (2.3) with respect to $(\alpha,\beta^{T},\phi,u_1,\ldots,u_n)$ results in consistent
estimators of the regression coefficients $\alpha$ and $\beta$ (Gleser, 1981). For any
model other than the normal, the task of maximizing (2.3) with respect to
its n+p+2 parameters is formidable and not likely to be undertaken. More
importantly it is not generally true that maximizing (2.3) produces
consistent estimators. It follows from results in the first author's
University of North Carolina Ph.D. thesis that in the case of logistic
regression the functional maximum likelihood estimator of $(\alpha,\beta)$ is not
consistent under assumption (2.1); see also Stefanski & Carroll (1986).

The unwieldy functional likelihood, and its failure to produce consistent
estimators in some important cases, point to the need for an alternative
theory of estimation, which is now pursued.


2.2  Unbiased  score  functions 


In this section unbiased score functions for the functional model are
obtained by conditioning on certain sufficient statistics. Note that (2.2)
can be written as

$$h_{Y,X}(y,x;\theta,u) = q(\delta,\theta,u)\,r(y,x,\theta) \tag{2.4}$$

where

$$\begin{aligned}
q(\delta,\theta,u) &= \exp\left\{\frac{u^{T}\Omega^{-1}\delta}{a(\phi)} - \frac{u^{T}\Omega^{-1}u + 2b(\alpha+\beta^{T}u)}{2a(\phi)}\right\};\\
r(y,x,\theta) &= \exp\left\{\frac{2\alpha y - x^{T}\Omega^{-1}x}{2a(\phi)} + c^{*}(y,\phi)\right\};\\
\delta &= \delta(y,x,\theta) = x + y\Omega\beta;\\
c^{*}(y,\phi) &= c(y,\phi) - \tfrac12\log\{(2\pi a(\phi))^{p}|\Omega|\}.
\end{aligned} \tag{2.5}$$

Thus viewing $u$ as a parameter and $\alpha$, $\beta$ and $\phi$ as fixed, the statistic

$$\Delta = \Delta(Y,X,\theta) = X + Y\Omega\beta \tag{2.6}$$

is sufficient for $u$. As a consequence, the distribution of $Y\mid\Delta$ depends
only on the observed variables $Y$ and $X$ and $\theta$, but not on $u$. From this
conditional distribution it is possible to derive unbiased estimating
equations for $\theta$ which are independent of $u$.

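Let $h_{Y|\Delta}(y\mid\delta;\theta)$ denote the conditional distribution of $Y\mid\Delta=\delta$. To
find $h_{Y|\Delta}$ note that the Jacobian of the transformation which takes $(Y,X)$
into $(Y,X+Y\Omega\beta)$ has a determinant of one. Thus

$$\mathrm{pr}(Y=y,\ \Delta=\delta)\,dm(y)\,d\delta = \mathrm{pr}(Y=y,\ X=\delta-y\Omega\beta)\,dm(y)\,d\delta$$

and after some routine calculations one finds

$$h_{Y|\Delta}(y\mid\delta;\theta) = \frac{S(y,\delta,\theta)}{\int S(y,\delta,\theta)\,dm(y)} \tag{2.7}$$

where

$$S(y,\delta,\theta) = \exp\left\{\frac{y(\alpha+\delta^{T}\beta) - \tfrac12 y^{2}\beta^{T}\Omega\beta}{a(\phi)} + c(y,\phi)\right\}. \tag{2.8}$$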
$$\psi_S(y,x,\theta) = \left.\frac{\dot h_{Y|\Delta}(y\mid\delta;\theta)}{h_{Y|\Delta}(y\mid\delta;\theta)}\right|_{\delta=x+y\Omega\beta} \tag{2.12}$$

it follows that $\psi_S(\cdot,\cdot,\cdot)$ is unbiased for $\theta$, i.e.,

$$E_\theta\{\psi_S(Y,X,\theta)\} = E_\theta[E_\theta\{\psi_S(Y,X,\theta)\mid\Delta\}] = 0.$$

The inner conditional expectation is zero by virtue of (2.11).

The score $\psi_S$ will be called the sufficiency score and any estimator
$\hat\theta_S = (\hat\alpha_S,\hat\beta_S^{T},\hat\phi_S)^{T}$ which satisfies

$$\sum_{i=1}^{n} \psi_S(Y_i,X_i,\hat\theta_S) = 0 \tag{2.13}$$

will be called a sufficiency estimator.


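Consider the density in (2.4) and let $\dot h_{Y,X} = (\partial/\partial\theta)h_{Y,X}$. Note that

$$\frac{\dot h_{Y,X}(y,x;\theta,u)}{h_{Y,X}(y,x;\theta,u)} - E\left\{\frac{\dot h_{Y,X}(Y,X;\theta,u)}{h_{Y,X}(Y,X;\theta,u)}\ \Big|\ \Delta=\delta\right\}
= \begin{bmatrix}
\{y - E(Y\mid\Delta=\delta)\}/a(\phi)\\
\{y - E(Y\mid\Delta=\delta)\}\,u/a(\phi)\\
r_\phi(y,x,\theta) - E\{r_\phi(Y,X,\theta)\mid\Delta=\delta\}
\end{bmatrix}$$

where

$$r_\phi(y,x,\theta) = \frac{\partial c^{*}(y,\phi)}{\partial\phi} - \left\{\frac{2\alpha y - x^{T}\Omega^{-1}x}{2a^{2}(\phi)}\right\}a'(\phi).$$

As the expression in brackets above depends on the unknown covariate $u$ only
as a 'weight' this suggests the class of score functions

$$\psi_C(y,x,\theta) = \begin{bmatrix}
\{y - E(Y\mid\Delta=\delta)\}/a(\phi)\\
\{y - E(Y\mid\Delta=\delta)\}\,\Omega\,t(\delta)\\
r_\phi(y,x,\theta) - E\{r_\phi(Y,X,\theta)\mid\Delta=\delta\}
\end{bmatrix}_{\delta=x+y\Omega\beta} \tag{2.14}$$

indexed by the vector-valued function $t(\cdot)$. The score (2.14) will be
called a conditional score following Lindsay (1980, 1982, 1983). Some
natural choices for $t(\delta)$ might be $t(\delta)=\delta$ and $t(\delta)=E_\theta(X\mid\Delta=\delta)$. Note that
since $X$ is unbiased for $u$ and $\Delta$ is sufficient for $u$ the latter choice
corresponds to replacing $u$ by its uniformly minimum variance unbiased
estimator. Also since

$$E_\theta(X\mid\Delta=\delta) = \delta - E_\theta(Y\mid\Delta=\delta)\,\Omega\beta \tag{2.15}$$

only the conditional moments of $Y\mid\Delta$ are needed to find $E_\theta(X\mid\Delta=\delta)$. More
will be said on appropriate choices for $t(\cdot)$ in Section 3.3.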
Any estimator $\hat\theta_C$ which satisfies

$$\sum_{i=1}^{n} \psi_C(Y_i,X_i,\hat\theta_C) = 0 \tag{2.16}$$

will be called a conditional estimator.

The estimating equations in (2.12) and (2.14) are both unbiased.
Although it should be possible to show that under reasonable conditions
there always exist consistent sequences of estimators $\hat\theta_S$ and $\hat\theta_C$
satisfying (2.13) and (2.16) respectively, it is not generally true that
(2.13) and (2.16) define $\hat\theta_S$ and $\hat\theta_C$ uniquely. More importantly, there
can exist sequences of solutions to (2.13) and (2.16) which are not
consistent, and thus care must be taken when defining $\hat\theta_S$ and $\hat\theta_C$. In
practice a couple of solutions to this dilemma are possible. The first
consists of defining the estimators $\hat\theta_S$ and $\hat\theta_C$ as the solutions to
(2.13) and (2.16) which are closest to the naive estimator introduced in
Section 1. This rule is justifiable when measurement error is small;
however, it can break down when measurement error is large. This is
discussed in greater detail for the normal model in the next section. The
second solution entails doing one or two steps of a Newton-Raphson
iteration of (2.13) and (2.16) starting from the naive estimator. Again this
is generally appropriate only when the measurement error is small. However,
in some realistic sampling situations, Stefanski & Carroll (1986) show that
such an approach substantially improves upon the naive estimator in their
study of measurement error in logistic regression. Finally, preliminary work
by the authors suggests that it is possible to deconvolute the empirical
distribution function of the observed $X_i$'s to obtain an estimator of the
empirical distribution function of the $u_i$'s, which under regularity
conditions can be used to construct consistent estimators for the functional
model. These estimators can then be used to uniquely define the more
manageable M-estimators, $\hat\theta_S$ and $\hat\theta_C$.

When consistent sequences of solutions to (2.13) and (2.16) are
obtained the asymptotic distributions of $\hat\theta_S$ and $\hat\theta_C$ are easily derived
since both are M-estimators; see Huber (1967).


2.3  Normal,  logistic  and  Poisson  regression 
In this section the strengths and limitations of the estimation theory
are illustrated by studying it in three particular generalized linear
models.

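Consider first the case in which $Y$ has a normal distribution with mean
$\alpha+\beta^{T}u$ and variance $\sigma^{2}$. For this model $\phi=\sigma^{2}$, $a(\phi)=\phi$ and $m(\cdot)$ is
Lebesgue measure. Using (2.7) one finds that the distribution of $Y\mid\Delta=\delta$
is normal with variance $\sigma^{2}/(1+\beta^{T}\Omega\beta)$ and mean $\mu$ where

$$\mu = \frac{\alpha+\beta^{T}\delta}{1+\beta^{T}\Omega\beta}. \tag{2.17}$$

Corresponding to (2.12) one finds

$$\psi_S(y,x,\theta) = \begin{bmatrix}
\sigma^{-2}(y-\mu)\\[4pt]
\dfrac{\Omega\beta}{1+\beta^{T}\Omega\beta} + \sigma^{-2}\{(y-\mu)(\delta-2\mu\Omega\beta) - (y-\mu)^{2}\Omega\beta\}\\[8pt]
\dfrac{(y-\mu)^{2}(1+\beta^{T}\Omega\beta)}{2\sigma^{4}} - \dfrac{1}{2\sigma^{2}}
\end{bmatrix}_{\delta=x+y\Omega\beta}$$

where $\mu$ is defined in (2.17). Define

$$\Delta_i^{*} = (I+\Omega\beta\beta^{T})^{-1}\{\Delta(Y_i,X_i,\theta) - \alpha\Omega\beta\}$$

where $\Delta(\cdot,\cdot,\cdot)$ is given by (2.6) and consider the equations

$$\sum_{i=1}^{n} (Y_i - \alpha - \beta^{T}\Delta_i^{*})\begin{pmatrix}1\\ \Delta_i^{*}\end{pmatrix} = 0,
\qquad
\sigma^{2} = \frac{1+\beta^{T}\Omega\beta}{n}\sum_{i=1}^{n}(Y_i-\mu_i)^{2}. \tag{2.18}$$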
It is a simple matter to show that every solution to (2.18) is also a
solution to $\sum\psi_S(Y_i,X_i,\theta) = 0$, i.e., any solution to (2.18) is a sufficiency
estimator. The similarity of (2.18) to the usual normal equations is
readily apparent. However, keep in mind that $\Delta_i^{*}$ depends on $\alpha$ and $\beta$ and
thus (2.18) is nonlinear in the parameters.

Note that $\Delta_i^{*}$ is also a sufficient statistic for $u_i$ when $\alpha$, $\beta$, and $\sigma^{2}$
are fixed. Because of (2.18) and the fact that given $\Delta_i^{*}$, $Y_i$ is normal
with mean $\alpha+\beta^{T}\Delta_i^{*}$, $\Delta_i^{*}$ will be called a conjugate sufficient
statistic. Also since $\Delta_i^{*}$ is the functional maximum likelihood estimator
for $u_i$ in this model (Gleser, 1981), equation (2.18) shows that the
functional maximum likelihood estimator is a sufficiency estimator.

From (2.18) it follows that $\hat\alpha_S = \bar Y - \hat\beta_S^{T}\bar X$ and using this it is
possible to deduce that $\hat\beta_S$ satisfies

$$\sum_{i=1}^{n} (Y_i^{*} - \hat\beta_S^{T}X_i^{*})\left\{X_i^{*} + \frac{(Y_i^{*} - \hat\beta_S^{T}X_i^{*})\,\Omega\hat\beta_S}{1+\hat\beta_S^{T}\Omega\hat\beta_S}\right\} = 0 \tag{2.19}$$

where $Y_i^{*} = Y_i - \bar Y$, $X_i^{*} = X_i - \bar X$.

Consider (2.19) for the case $p = 1$, i.e., $\hat\beta_S$ is a scalar. This
quadratic equation has two real roots (Kendall & Stuart, 1979, Chapter 29);
unfortunately the sufficiency principle does not indicate which root is
appropriate. Had the equations (2.18) been derived as the gradient of the
functional log-likelihood the appropriate root would have been dictated by
the maximizing principle.

In the previous section it was suggested that, in the case of multiple
solutions to (2.13) and (2.16), one pick the solution closest to the naive
estimator, and that this selection rule would work as long as the
measurement error variance was small. In this particular case the two roots
of (2.19) converge to $\beta_0$ and $-\sigma^{2}/(\beta_0\tau^{2})$, where $\tau^{2} = \Omega\sigma^{2}$ is the
measurement error variance. The naive estimator converges to
$\sigma_u^{2}\beta_0/(\sigma_u^{2}+\tau^{2})$ where $\sigma_u^{2}$ is the limiting value of the sample variance of
the true $u_i$'s. Thus the suggested selection rule will asymptotically choose
the right root whenever

$$\left|1 - \frac{\sigma_u^{2}}{\sigma_u^{2}+\tau^{2}}\right| < \left|\frac{\sigma_u^{2}}{\sigma_u^{2}+\tau^{2}} + \frac{\sigma^{2}}{\tau^{2}\beta_0^{2}}\right|.$$

This inequality is satisfied if

$$2\tau^{2} < \sigma_u^{2} + (\sigma^{2}/\beta_0^{2}) + \left\{\left(\sigma_u^{2} + \frac{\sigma^{2}}{\beta_0^{2}}\right)^{2} + \frac{4\sigma_u^{2}\sigma^{2}}{\beta_0^{2}}\right\}^{1/2}.$$

The infimum of the right hand side above with respect to the ratio $\sigma^{2}/\beta_0^{2}$
is $2\sigma_u^{2}$. Thus whenever $\tau^{2} < \sigma_u^{2}$ the selection rule works no matter what
the values of $\sigma^{2}$ and $\beta_0^{2}$; however, if $\tau^{2} > \sigma_u^{2}$ and $\sigma^{2}/\beta_0^{2}$ is
sufficiently small then the selection rule chooses the wrong root. This is
encouraging for it is unusual to have measurement error so large that
$\tau^{2} \ge \sigma_u^{2}$.
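The root-selection rule can be tried numerically. The sketch below is an
illustration rather than the authors' procedure: it assumes p = 1, and it
uses the quadratic $\Omega S_{xy}b^{2} + (S_{xx}-\Omega S_{yy})b - S_{xy} = 0$, which is one
algebraic rearrangement of the sufficiency equations for the slope $b$ in
terms of centered sums of squares and products. The function name
`sufficiency_roots`, the uniform design for the $u_i$ and all numerical
values are hypothetical.

```python
import math
import random

def sufficiency_roots(xs, ys, omega):
    """Return both roots of the scalar quadratic, the naive slope, and the root nearest to it."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    # Quadratic omega*sxy*b^2 + (sxx - omega*syy)*b - sxy = 0 (assumes sxy != 0).
    a, b, c = omega * sxy, sxx - omega * syy, -sxy
    disc = math.sqrt(b * b - 4.0 * a * c)  # equals b^2 + 4*omega*sxy^2, never negative
    roots = ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a))
    naive = sxy / sxx
    chosen = min(roots, key=lambda r: abs(r - naive))  # selection rule: nearest to naive
    return roots, naive, chosen

rng = random.Random(1)
alpha0, beta0, sigma, omega = 1.0, 2.0, 0.5, 0.4
tau = math.sqrt(omega) * sigma                       # tau^2 = omega * sigma^2
us = [rng.uniform(-2.0, 2.0) for _ in range(5000)]   # fixed "true" covariates
xs = [u + rng.gauss(0.0, tau) for u in us]
ys = [alpha0 + beta0 * u + rng.gauss(0.0, sigma) for u in us]
roots, naive, chosen = sufficiency_roots(xs, ys, omega)
```

In this configuration $\tau^{2} = 0.1$ is well below the variance of the $u_i$, so
the rule should pick the root near $\beta_0 = 2$; the product of the two roots
equals $-1/\Omega$, matching the limits $\beta_0$ and $-\sigma^{2}/(\beta_0\tau^{2})$ quoted above.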


To gain some additional insight into the performance of the
sufficiency estimator suppose that $u_1,\ldots,u_n$ are independent normal variates
with mean $\mu_u$ and variance $\sigma_u^{2}$, i.e. assume a structural model. In this
case (Kendall & Stuart, 1979, Chapter 29) the structural and functional
maximum likelihood estimators are the same, and in light of the previous
discussion this common estimator is also a sufficiency estimator. Thus in
this particular case the sufficiency score is an efficient score.

Finally for the normal model $E_\theta(Y_i\mid\Delta_i) = \alpha+\beta^{T}\Delta_i^{*}$ and from (2.15)

$$E_\theta(X_i\mid\Delta_i) = \Delta_i^{*}.$$

Thus $\psi_S$ and $\psi_C$ define the same estimators, i.e., $\hat\theta_S = \hat\theta_C$, when
$t(\delta) = E_\theta(X\mid\Delta=\delta)$.

Now consider logistic regression in which

$$\mathrm{pr}_\theta(Y=1\mid u) = F(\alpha+\beta^{T}u), \qquad F(t) = (1+e^{-t})^{-1}.$$

For this model $a(\phi) = 1$ and $m(\cdot)$ is counting measure on $\{0,1\}$.
Using (2.7) one obtains

$$\mathrm{pr}_\theta(Y=1\mid\Delta=\delta) = F\{\alpha+(\delta-\tfrac12\Omega\beta)^{T}\beta\} \tag{2.20}$$

and corresponding to (2.12) is the logistic sufficiency score

$$\psi_S(y,x,\theta) = \left.\left[y - F\{\alpha+(\delta-\tfrac12\Omega\beta)^{T}\beta\}\right]\begin{pmatrix}1\\ \delta-\Omega\beta\end{pmatrix}\right|_{\delta=x+y\Omega\beta}; \tag{2.21}$$

and setting $\sum\psi_S(Y_i,X_i,\theta) = 0$ results in the equivalent equations

$$\sum_{i=1}^{n} \{Y_i - F(\alpha+\beta^{T}\Delta_i^{*})\}\begin{pmatrix}1\\ \Delta_i^{*}\end{pmatrix} = 0, \tag{2.22}$$

where $\Delta_i^{*} = \Delta_i - \tfrac12\Omega\beta$; note that $\Delta_i^{*}$ is a conjugate sufficient statistic.
Stefanski & Carroll (1986) introduced this estimator and show in a Monte
Carlo study that in spite of the possibility of multiple solutions to
(2.22), a modified one-step version of $(\hat\alpha_S,\hat\beta_S^{T})^{T}$ starting from the naive
estimator performed well in some realistic sampling situations. Unlike
the normal model the logistic sufficiency estimator does not correspond to
the functional maximum likelihood estimator, which in this case is not
consistent; see the first author's University of North Carolina Ph.D.
thesis and Stefanski & Carroll (1986). In Section 3.3 it is shown that
the logistic sufficiency score is optimal for a particular structural
model.
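The conditional probability (2.20) can be checked numerically from the joint
law of $(Y,X)$. The following sketch is an illustration, not code from the
paper. It assumes p = 1 and $a(\phi) = 1$, so that the measurement-error
variance is the scalar `omega`; the helper names `normal_pdf` and
`cond_prob_y1` and the parameter values are hypothetical. Because
$\Delta = X + Y\Omega\beta$, the event $\{Y=1, \Delta=\delta\}$ corresponds to $X = \delta-\Omega\beta$
and $\{Y=0, \Delta=\delta\}$ to $X = \delta$.

```python
import math

def F(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def cond_prob_y1(delta, u, alpha, beta, omega):
    """pr(Y = 1 | Delta = delta) computed directly from the joint density at a given u."""
    p1 = F(alpha + beta * u) * normal_pdf(delta - omega * beta, u, omega)  # Y = 1 branch
    p0 = (1.0 - F(alpha + beta * u)) * normal_pdf(delta, u, omega)         # Y = 0 branch
    return p1 / (p1 + p0)

alpha, beta, omega, delta = -0.5, 1.2, 0.3, 0.8
closed_form = F(alpha + (delta - 0.5 * omega * beta) * beta)   # right-hand side of (2.20)
direct = [cond_prob_y1(delta, u, alpha, beta, omega) for u in (-2.0, 0.0, 1.5)]
```

Every entry of `direct` agrees with `closed_form` whatever value of `u` is
used, which is the sufficiency of $\Delta$ for $u$ in this model.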

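For logistic regression it is not true that $\hat\theta_S = \hat\theta_C$ when
$t(\delta) = E(X\mid\Delta=\delta)$. Indeed with $E_\theta(Y\mid\Delta=\delta)$ given by (2.20),

$$E(X\mid\Delta=\delta) = \delta - F\{\alpha+(\delta-\tfrac12\Omega\beta)^{T}\beta\}\,\Omega\beta$$

and corresponding to (2.16) are the equations

$$\sum_{i=1}^{n} \{Y_i - F(\alpha+\beta^{T}\Delta_i^{*})\}\begin{pmatrix}1\\ \Delta_i^{*} + \{\tfrac12 - F(\alpha+\beta^{T}\Delta_i^{*})\}\,\Omega\beta\end{pmatrix} = 0.$$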
The final model considered is that of Poisson regression in which

$$\mathrm{pr}_\theta(Y=k\mid u) = (k!)^{-1}\exp\{k(\alpha+\beta^{T}u) - \exp(\alpha+\beta^{T}u)\}.$$

For this model $a(\phi) = 1$ and $m(\cdot)$ is counting measure on $\{0,1,\ldots\}$.
From (2.7) it follows that

$$\mathrm{pr}_\theta(Y=k\mid\Delta=\delta) = \frac{(k!)^{-1}\exp\{k(\alpha+\beta^{T}\delta) - k^{2}\beta^{T}\Omega\beta/2\}}{\sum_{j=0}^{\infty}(j!)^{-1}\exp\{j(\alpha+\beta^{T}\delta) - j^{2}\beta^{T}\Omega\beta/2\}}. \tag{2.23}$$

Since (2.23) has no closed form the scores $\psi_S$ and $\psi_C$ are quite
messy and are not given. Note that there is no conjugate sufficient
statistic. Also, as in logistic regression, the estimators $\hat\theta_S$ and $\hat\theta_C$
are not equal for this model when $t(\delta) = E(X\mid\Delta=\delta)$.

The conditional distribution (2.23) is more typical of generalized
linear models than are those from the logistic and normal models. Since in
(2.8) the factor $y^{2}\beta^{T}\Omega\beta/2a(\phi)$ appears in the exponent it is only in special
cases that the denominator of (2.7) can be obtained in closed form. Thus
implementation of the sufficiency estimators will often require numerical
integration or summation.
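As an illustration of the numerical summation just mentioned, the sketch
below evaluates the Poisson conditional distribution (2.23) by truncating
the infinite sum. It is a minimal example under stated assumptions, not the
authors' implementation: p = 1, the truncation point `kmax` is an arbitrary
hypothetical choice, and the terms are computed on the log scale to avoid
overflow in $k!$.

```python
import math

def poisson_conditional_pmf(delta, alpha, beta, omega, kmax=200):
    """Truncated-sum evaluation of pr(Y = k | Delta = delta), k = 0..kmax, for scalar u."""
    eta = alpha + beta * delta
    c = beta * omega * beta
    # log of (k!)^{-1} exp{k*eta - k^2*c/2}; lgamma(k + 1) = log(k!)
    log_terms = [k * eta - 0.5 * k * k * c - math.lgamma(k + 1) for k in range(kmax + 1)]
    m = max(log_terms)
    weights = [math.exp(t - m) for t in log_terms]   # subtract the max for stability
    total = sum(weights)
    return [w / total for w in weights]

def conditional_mean(pmf):
    """E(Y | Delta = delta) under a pmf on 0..kmax."""
    return sum(k * p for k, p in enumerate(pmf))

pmf = poisson_conditional_pmf(delta=1.0, alpha=0.2, beta=0.7, omega=0.5)
pmf_no_error = poisson_conditional_pmf(delta=1.0, alpha=0.2, beta=0.7, omega=0.0)
```

When `omega` is zero the quadratic term vanishes and the distribution
reduces to an ordinary Poisson law with mean $\exp(\alpha+\beta\delta)$, which gives a
simple check on the summation; the conditional mean obtained this way is
what (2.15) requires for $E(X\mid\Delta=\delta)$.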


3.  STRUCTURAL  MODELS 

3.1  The  structural  likelihood 

In this section the model studied is the structural version of (1.1) &
(1.2), i.e., $u_1,\ldots,u_n$ are independent and identically distributed
observations with unknown density $g_U(u)$. Since it should cause no
confusion the subscript $U$ on $g_U(u)$ is omitted. The density $g$ is an element
of $G$, a family of densities with respect to the measure $\nu(\cdot)$. As in Section
2, it is assumed that $M = 1$ along with the identifiability condition
(2.1). Under these conditions the joint density of $(Y,X)$ is
$$f_{Y,X}(y,x;\theta,g) = \int h_{Y,X}(y,x;\theta,u)\,g(u)\,d\nu(u) \tag{3.1}$$

where $h_{Y,X}$ is defined in (2.2).

Let $\dot f_{Y,X}(y,x;\theta,g) = (\partial/\partial\theta)f_{Y,X}(y,x;\theta,g)$ and assume that

$$\dot f_{Y,X}(y,x;\theta,g) = \int \dot h_{Y,X}(y,x;\theta,u)\,g(u)\,d\nu(u)$$

where

$$\dot h_{Y,X}(y,x;\theta,u) = (\partial/\partial\theta)h_{Y,X}(y,x;\theta,u),$$

i.e., assume that differentiation and integration can be interchanged in
(3.1). If $g(\cdot)$ were known then the efficient score for $\theta$ would be

$$\ell(y,x,\theta,g) = \frac{\dot f_{Y,X}(y,x;\theta,g)}{f_{Y,X}(y,x;\theta,g)},$$

and the information available in $(Y,X)$ for estimating $\theta$ would be

$$I = E(\ell\ell^{T}).$$

Throughout this section interest lies in estimating $\theta$ when $g$ and hence $\ell$
are unknown. Note that both the sufficiency and conditional scores of
Section 2 are unbiased for the structural model (3.1) also. Attention
therefore is directed to the problem of finding efficient score functions.


3.2  Efficient  score  functions  and  information  bounds. 

Efficient score functions for estimation of $\theta^{T} = (\alpha,\beta^{T},\phi)$ in the
presence of the nuisance function $g(\cdot)$ are now derived. As with the
theory in Section 2 the existence of certain sufficient statistics plays a
key role here. The derivation draws heavily on the results of Begun, Hall,
Huang & Wellner (1983); see also Pfanzagl (1982, Chapter 14). The
structural model studied here is a generalization of a model considered by
Bickel & Ritov (1986). Whereas they study simple linear regression under a
number of conditions, including that of replicated measurements and our
assumption (2.1), we consider the more general model only under the latter
assumption. However, the approach used here to derive efficient scores
extends quite naturally to the case of replicated measurements when (2.1)
is not assumed.

In the following $\mu$ denotes the product measure of $m(\cdot)$ $\times$ Lebesgue
measure on p-dimensional Euclidean space. As in Begun et al. (1983) let
$L^{2}(\mu)$ and $L^{2}(\nu)$ denote the $L^{2}$-spaces of square-integrable functions with
respect to the measures $\mu$ and $\nu$ respectively. Norms and inner products on
these spaces are denoted by $\|\cdot\|_\mu$ and $\langle\cdot,\cdot\rangle_\mu$, and $\|\cdot\|_\nu$ and $\langle\cdot,\cdot\rangle_\nu$.

The theory of Begun et al. (1983) requires Hellinger-differentiability
of the square root of (3.1) with respect to $(\theta,g)$; see their
Definition 2.1. It is assumed here, and can be proven under regularity
conditions, that $f^{1/2}_{Y,X}(y,x;\theta,g)$ satisfies condition (2.1) of Begun et
al. and hence is Hellinger-differentiable. Its differential, for sequences
$(\theta_n,g_n)$ satisfying $\|\theta_n-\theta\| + \|g_n^{1/2}-g^{1/2}\|_\nu$ converging to zero, is given by

$$\rho^{T}(\theta_n-\theta) + A(g_n^{1/2}-g^{1/2})$$

where

$$\rho = \tfrac12 f^{1/2}_{Y,X}(y,x;\theta,g)\,\ell(y,x,\theta,g),$$

and the linear operator $A$ taking $L^{2}(\nu)$ into $L^{2}(\mu)$ is defined for $\Gamma$ in $L^{2}(\nu)$
via

$$A\Gamma = \frac{\int h_{Y,X}(y,x;\theta,u)\,\Gamma(u)\,d\nu(u)}{2f^{1/2}_{Y,X}(y,x;\theta,g)}.$$

When necessary to indicate dependence on $(y,x,\theta,g)$, $\rho$ is written $\rho(y,x,\theta,g)$
and $A\Gamma$ as $A\Gamma(y,x,\theta,g)$.

The key result of Begun et al. (1983) used here is that when $g$ is
unknown the efficient score for $\theta$ is

$$\ell^{*}(y,x,\theta,g) = \frac{2(\rho - A\Gamma^{*})}{f^{1/2}_{Y,X}(y,x;\theta,g)} \tag{3.2}$$

where the $L^{2}(\nu)$ function $\Gamma^{*}$ satisfies

$$\langle \rho - A\Gamma^{*},\ A\Gamma\rangle_\mu = 0 \tag{3.3}$$

for all functions $\Gamma$ obtained as $L^{2}(\nu)$ limits of sequences $(\Gamma_n)$ of the form

$$\Gamma_n = n^{1/2}(g_n^{1/2} - g^{1/2}).$$
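In terms of expectations (3.3) becomes

$$\tfrac14 E_{\theta,g}\left[\left\{\ell(Y,X,\theta,g) - \frac{2A\Gamma^{*}(Y,X,\theta,g)}{f^{1/2}_{Y,X}(Y,X;\theta,g)}\right\}\frac{2A\Gamma(Y,X,\theta,g)}{f^{1/2}_{Y,X}(Y,X;\theta,g)}\right] = 0. \tag{3.4}$$

Note that for any $\Gamma \in L^{2}(\nu)$

$$\frac{2A\Gamma(Y,X,\theta,g)}{f^{1/2}_{Y,X}(Y,X;\theta,g)} = \frac{\int h_{Y,X}(Y,X;\theta,u)\,\Gamma(u)\,d\nu(u)}{\int h_{Y,X}(Y,X;\theta,u)\,g(u)\,d\nu(u)}$$

and thus in view of (2.4)

$$\frac{2A\Gamma(Y,X,\theta,g)}{f^{1/2}_{Y,X}(Y,X;\theta,g)} = \frac{\int q(\Delta,\theta,u)\,\Gamma(u)\,d\nu(u)}{\int q(\Delta,\theta,u)\,g(u)\,d\nu(u)} \tag{3.5}$$

where $\Delta = X + Y\Omega\beta$ is defined in (2.6). The important fact here is that the
right hand side of (3.5) depends on $(Y,X)$ only through the complete
sufficient statistic $\Delta$ irrespective of $\Gamma$; this is a consequence of the
sufficiency of $\Delta$ for $u$, when $u$ is regarded as a parameter. It follows now
that (3.3) holds for all $\Gamma$ when

$$\frac{2A\Gamma^{*}(Y,X,\theta,g)}{f^{1/2}_{Y,X}(Y,X;\theta,g)} = E_{\theta,g}\{\ell(Y,X,\theta,g)\mid\Delta\}.$$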
Thus the efficient score given by (3.2) is

$$\ell^{*}(y,x,\theta,g) = \ell(y,x,\theta,g) - E_{\theta,g}\{\ell(Y,X,\theta,g)\mid\Delta=\delta\}\big|_{\delta=x+y\Omega\beta}$$

and the "information" available in $(Y,X)$ for estimating $\theta$ in the
presence of $g$ is

$$I_{*} = E\{\ell^{*}(Y,X,\theta,g)\,\ell^{*T}(Y,X,\theta,g)\}; \tag{3.6}$$

see Equation (3.4) of Begun et al. (1983).

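To compute $\ell(y,x,\theta,g)$ let $q^{*}(\theta,u) = q\{\delta(y,x,\theta),\theta,u\}$ where $q(\delta,\theta,u)$ and
$\delta(y,x,\theta)$ are given in (2.5). Then using (2.4)

$$f_{Y,X}(y,x;\theta,g) = \int q^{*}(\theta,u)\,r(y,x,\theta)\,g(u)\,d\nu(u),$$

and thus

$$\ell(y,x,\theta,g) = \frac{\int (\partial q^{*}/\partial\theta)\,g\,d\nu}{\int q^{*}g\,d\nu} + \frac{\partial r/\partial\theta}{r}.$$

Now since

$$\frac{\partial q^{*}(\theta,u)}{\partial\theta} = q^{*}(\theta,u)\begin{bmatrix}
-b'(\alpha+u^{T}\beta)/a(\phi)\\[4pt]
\{y - b'(\alpha+u^{T}\beta)\}\,u/a(\phi)\\[4pt]
\left\{\dfrac{u^{T}\Omega^{-1}u + 2b(\alpha+u^{T}\beta) - 2u^{T}\Omega^{-1}\delta(y,x,\theta)}{2a^{2}(\phi)}\right\}a'(\phi)
\end{bmatrix},
\qquad
\frac{\partial r(y,x,\theta)/\partial\theta}{r(y,x,\theta)} = \begin{bmatrix} y/a(\phi)\\ 0\\ r_\phi(y,x,\theta)\end{bmatrix},$$

where

$$r_\phi(y,x,\theta) = \frac{\partial c^{*}(y,\phi)}{\partial\phi} - \left\{\frac{2\alpha y - x^{T}\Omega^{-1}x}{2a^{2}(\phi)}\right\}a'(\phi),$$

$\ell(y,x,\theta,g)$ can be written as

$$\ell(y,x,\theta,g) = \begin{bmatrix}
\{y - f_1(\delta,\theta,g)\}/a(\phi)\\
yR(\delta,\theta,g) - f_2(\delta,\theta,g)\\
r_\phi(y,x,\theta) - f_3(\delta,\theta,g)
\end{bmatrix}_{\delta=x+y\Omega\beta}$$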
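where the scalar-valued functions $f_1$ and $f_3$, and the p-vector-valued
functions $R$ and $f_2$ depend on $(y,x)$ only through $\delta = x+y\Omega\beta$. It follows that

$$\ell^{*}(y,x,\theta,g) = \ell(y,x,\theta,g) - E\{\ell(Y,X,\theta,g)\mid\Delta=\delta\}
= \begin{bmatrix}
\{y - E(Y\mid\Delta=\delta)\}/a(\phi)\\
\{y - E(Y\mid\Delta=\delta)\}\,R(\delta,\theta,g)\\
r_\phi(y,x,\theta) - E\{r_\phi(Y,X,\theta)\mid\Delta=\delta\}
\end{bmatrix}_{\delta=x+y\Omega\beta}. \tag{3.7}$$

Define

$$w(\gamma) = \int q(\gamma,\theta,u)\,g(u)\,d\nu(u); \tag{3.8}$$

and

$$\dot w(\gamma) = (\partial/\partial\gamma)w(\gamma) = \frac{1}{a(\phi)}\int \Omega^{-1}u\,q(\gamma,\theta,u)\,g(u)\,d\nu(u).$$

Now the function $R(\delta,\theta,g)$ appearing in (3.7) is given by

$$R(\delta,\theta,g) = \frac{\Omega\,\dot w(\delta)}{w(\delta)}.$$

Using the relation $x = \delta - y\Omega\beta$

$$r_\phi(y,x,\theta) = \frac{\partial c^{*}(y,\phi)}{\partial\phi} - \left\{\frac{2\alpha y - (\delta-y\Omega\beta)^{T}\Omega^{-1}(\delta-y\Omega\beta)}{2a^{2}(\phi)}\right\}a'(\phi) \tag{3.9}$$

and thus (3.7) involves only expectations of functions of $Y\mid\Delta=\delta$.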


3.3.  Efficiency  of  the  sufficiency  and  conditional  scores 
in  a  structural  setting 

In the discussion of the normal linear functional model in Section
2.3, it was deduced that the sufficiency score is equivalent to the
efficient score for the structural version of this model when the true
predictors $(u_1,\ldots,u_n)$ are themselves normally distributed. A similar
result for logistic regression is now derived. Compare (2.21) to the
logistic efficient score given by

$$\ell^{*}(y,x,\theta,g) = \left.\left[y - F\{\alpha+(\delta-\tfrac12\Omega\beta)^{T}\beta\}\right]\begin{pmatrix}1\\ R(\delta,\theta,g)\end{pmatrix}\right|_{\delta=x+y\Omega\beta}; \tag{3.10}$$

equation (3.10) is just (3.7) for the case of logistic regression. For
(2.21) and (3.10) to be equivalent the function $R(\delta,\theta,g)$ must be linear in
$\delta$. Since by (3.9)

$$R(\delta,\theta,g) = \Omega\,\frac{\dot w(\delta)}{w(\delta)} = \Omega\,\frac{\partial}{\partial\delta}\log\{w(\delta)\}$$

this means that $\log\{w(\delta)\}$ must be a quadratic form in $\delta$, call it $Q(\delta)$,
i.e., using (3.8) with $q(\delta,\theta,u)$ chosen accordingly for logistic regression,

$$\exp\{Q(\delta)\} = \int \exp\left(u^{T}\Omega^{-1}\delta - \tfrac12 u^{T}\Omega^{-1}u\right)\frac{g(u)}{1+\exp(\alpha+\beta^{T}u)}\,du.$$

Now using a moment-generating-characteristic-function argument it follows
that

$$\exp\left(-\tfrac12 u^{T}\Omega^{-1}u\right)\frac{g(u)}{1+\exp(\alpha+\beta^{T}u)}$$

must be proportional to a p-variate normal distribution. This means that
$g(u)$ must be a mixture of two p-variate normal distributions with different
means and common covariance matrix. The picture is now clear; the
sufficiency score (2.21) is efficient in a structural setting only when $(Y,U)$
satisfy the assumptions of the normal discrimination model,

$$\mathrm{pr}(Y=1) = \pi_1, \qquad U\mid Y=y \sim N(\mu_y,\Psi).$$

Of course if all of this information were known a priori then the linear
discriminant, $\alpha+\beta^{T}u$, would most likely be estimated using the full
likelihood as opposed to using logistic regression; see Efron (1975) and
Michalek & Tripathi (1980).
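The link between the normal discrimination model and logistic regression
can be verified directly. The sketch below is illustrative only; it assumes
p = 1, writes the common variance as `psi`, and uses the standard identities
$\beta = (\mu_1-\mu_0)/\Psi$ and $\alpha = \log\{\pi_1/(1-\pi_1)\} - (\mu_1^{2}-\mu_0^{2})/(2\Psi)$. The function
names and numerical values are hypothetical.

```python
import math

def F(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))

def normal_pdf(u, mean, var):
    return math.exp(-0.5 * (u - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_y1(u, pi1, mu0, mu1, psi):
    """pr(Y = 1 | U = u) by Bayes' rule when U | Y = y is N(mu_y, psi)."""
    num = pi1 * normal_pdf(u, mu1, psi)
    den = num + (1.0 - pi1) * normal_pdf(u, mu0, psi)
    return num / den

def logistic_coefficients(pi1, mu0, mu1, psi):
    """Intercept and slope of the logistic model implied by the discrimination model."""
    beta = (mu1 - mu0) / psi
    alpha = math.log(pi1 / (1.0 - pi1)) - (mu1 ** 2 - mu0 ** 2) / (2.0 * psi)
    return alpha, beta

pi1, mu0, mu1, psi = 0.3, -0.5, 1.0, 2.0
alpha, beta = logistic_coefficients(pi1, mu0, mu1, psi)
gaps = [abs(posterior_y1(u, pi1, mu0, mu1, psi) - F(alpha + beta * u))
        for u in (-3.0, -1.0, 0.0, 0.7, 2.5)]
```

The entries of `gaps` are zero up to rounding, so a two-component normal
mixture for $u$ with common variance is compatible with a logistic model for
$Y$ given $u$, which is the structural setting identified above.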

A theorem is now proved which indicates when the conditional score $\psi_C$
defined in (2.14) is the efficient score in some structural setting. This
provides some insight into appropriate choices for $t(\cdot)$ when choosing a
conditional score (2.14).

THEOREM. The conditional score $\psi_C$ is the efficient score in a structural
model for some density $g(\cdot)$ and some measure $\nu(\cdot)$ if and only if there
exists a real-valued function $T(\cdot)$ such that $t(\delta) = (\partial/\partial\delta)T(\delta)$
where $\exp[T\{a(\phi)\Omega s\}]$ is a moment generating function for some probability
density with respect to $\nu(\cdot)$.

PROOF. Assume that $\psi_C$ is the efficient score in a structural model with
density $g(\cdot)$ and measure $\nu(\cdot)$. Then comparing (2.14) and (3.7) it follows
that

$$t(\delta) = \frac{\dot w(\delta)}{w(\delta)}$$

where $w(\delta)$ is given by (3.8). Let $T(\delta) = \log\{w(\delta)\} - k$ for a constant $k$ to
be determined later. Clearly $(\partial/\partial\delta)T(\delta) = t(\delta)$ and furthermore, using (3.8),
(2.14), which is known to be efficient for some structural model. The
theorem indicates that the class of appropriate functions $t(\cdot)$ is fairly
restricted.


3.4.  Efficient  estimation. 

Since the efficient score in (3.7) depends on the unknown density
$g(\cdot)$, it is not readily apparent how one constructs a sequence of
estimators with asymptotically minimum variance. Begun et al. (1983)
suggest in general solving

$$\sum_{i=1}^{n} \ell^{*}(Y_i,X_i,\theta,\hat g) = 0 \tag{3.11}$$

where $\hat g(\cdot)$ is some suitable initial estimator of $g(\cdot)$. Since the
empirical distribution function, $F_n$, of the observed $X$'s converges to
the convolution of $G$ with a normal distribution function it should in
theory be possible to deconvolute to obtain consistent estimators of $G$
which would then be smoothed to obtain estimators of $g(\cdot)$. In practice
this is quite difficult and technical problems might arise when $p > 1$. Also
given a $\hat g(\cdot)$ it is still possible that (3.11) will have multiple
solutions, not all yielding consistent sequences.

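This last problem can be avoided if a root-n consistent preliminary
estimator, $\tilde\theta$, is available. Again let $\hat g(\cdot)$ be an estimator of $g(\cdot)$
and define

$$\hat\theta = \tilde\theta + \hat I_{*}^{-1}\,n^{-1}\sum_{i=1}^{n} \ell^{*}(Y_i,X_i,\tilde\theta,\hat g)$$

where $\hat I_{*}$ is an estimator of $I_{*}$, e.g.,

$$\hat I_{*} = -n^{-1}\sum_{i=1}^{n} \dot\ell^{*}(Y_i,X_i,\tilde\theta,\hat g), \qquad \dot\ell^{*}(y,x,\theta,g) = (\partial/\partial\theta)\ell^{*}(y,x,\theta,g).$$

Then $\hat\theta$ will generally be asymptotically efficient provided $\hat I_{*}$ and $\hat g(\cdot)$
are good estimates of $I_{*}$ and $g(\cdot)$ respectively. This approach still
requires an estimator $\hat g(\cdot)$ of $g(\cdot)$.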
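The one-step construction can be sketched generically. The code below is
not the efficient-score estimator itself: it assumes a scalar parameter, it
replaces the analytic derivative of the score by a central finite
difference, and it uses a stand-in score $y - \exp(\theta)$ (the score of a
Poisson log-mean) purely to exercise the update. The function name
`one_step`, the step size `eps` and the data are hypothetical.

```python
import math

def one_step(theta0, score, data, eps=1e-5):
    """One Newton-type step from theta0 for the estimating equation sum(score) = 0."""
    n = len(data)
    mean_score = sum(score(d, theta0) for d in data) / n
    # Information estimate: minus the average derivative of the score (central difference).
    info = -sum(score(d, theta0 + eps) - score(d, theta0 - eps) for d in data) / (2.0 * eps * n)
    return theta0 + mean_score / info

def stand_in_score(y, theta):
    """Hypothetical score used only for illustration: Poisson model with log-mean theta."""
    return y - math.exp(theta)

data = [2, 3, 1, 4, 0, 2, 5, 3]
target = math.log(sum(data) / len(data))   # exact root of the stand-in estimating equation
theta0 = target + 0.2                      # deliberately rough preliminary estimate
theta1 = one_step(theta0, stand_in_score, data)
```

A single step moves the preliminary value most of the way to the root,
which is why a root-n consistent starting point is enough for the update
displayed above; in the measurement-error setting the score would be
$\ell^{*}$ evaluated at an estimate of $g(\cdot)$.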

Note that $\ell^{*}(y,x,\theta,g)$ depends on $g(\cdot)$ only through the function $w(\cdot)$ in
(3.8) and its derivative. In work in progress the authors are
investigating a one-step construction of an asymptotically efficient
estimator which estimates $w(\cdot)$ directly, avoiding the intermediate step of
estimating $g(\cdot)$.

4.0  CONCLUDING  REMARKS 

In conclusion we reiterate that the assumption of normal errors,
(1.2), is not crucial to the theory developed herein; the existence of a
complete sufficient statistic for $u$ when regarded as a parameter is
crucial. The situation in which (2.1) is replaced with an assumption of
replicated measurements, i.e., $M > 1$ in (1.2), is conceptually no different
than when (2.1) is assumed with the exception that both $\Omega$ and $\phi$ can now
be estimated; thus there will be an additional $p(p+1)/2$-dimensional
component to all the scores.

Although no distributional assumption on the measurement errors is
more reasonable than that of normality it is still an unverifiable
assumption unless replicate measurements are made. The sufficiency,
conditional and efficient scores lose their unbiasedness when the assumption
of normal errors is erroneous. Thus when measurement error is nonnormal,
estimates derived from these scores will generally be biased and the bias
will generally not be computable. Approximations to the bias can probably